-F 256 sam/ERR1823601.bam \
> sam/ERR1823601_unmapped.bam
samtools view \
-b -f 12 \
-F 256 sam/ERR1823608.bam \
> sam/ERR1823608_unmapped.bam
The “-f 12” option keeps only the read pairs in which both the read and its mate are
unmapped (FLAG bits 4 and 8), and the “-F 256” option excludes secondary alignments.
Refer to Chapter 2 for a description of the FLAG field of SAM/BAM files.
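The bit arithmetic behind these two options can be sketched in a few lines of shell. This is only an illustration: “passes_filter” is a hypothetical helper, not part of Samtools, and the FLAG values in the loop are hand-picked examples rather than values taken from our BAM files.

```shell
# Hypothetical helper (not a Samtools command): succeeds if a decimal FLAG
# value would be kept by "samtools view -f 12 -F 256".
passes_filter() {
    # -f 12 : bits 4 (read unmapped) and 8 (mate unmapped) must both be set
    # -F 256: bit 256 (secondary alignment) must not be set
    [ $(( $1 & 12 )) -eq 12 ] && [ $(( $1 & 256 )) -eq 0 ]
}

# 77  = 1+4+8+64    both mates unmapped, first in pair
# 141 = 1+4+8+128   both mates unmapped, second in pair
# 73  = 1+8+64      read mapped, mate unmapped
# 99  = 1+2+32+64   properly paired, both mapped
# 333 = 77+256      contrived value, only to show the effect of -F 256
for flag in 77 141 73 99 333; do
    if passes_filter "$flag"; then
        echo "$flag kept"
    else
        echo "$flag dropped"
    fi
done
```

Only the first two values are kept. A pair in which just one mate is unmapped (for example, FLAG 73) is dropped, because “-f” requires all of the listed bits to be set.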
The above Samtools commands write the unmapped reads, which represent the
pure metagenomic data, to the separate BAM files “ERR1823587_unmapped.bam”,
“ERR1823601_unmapped.bam”, and “ERR1823608_unmapped.bam”.
8.2.3.5 Creating Paired-End FASTQ Files from BAM Files
Now, we can extract FASTQ files from the above BAM files; we will extract two FASTQ
files (forward and reverse reads) from each BAM file. However, before doing that, we need
to sort the BAM files by read name using the “samtools sort” command with the “-n” option,
which sorts the reads so that the two mates of each pair are next to each other.
samtools sort \
-n -m 5G \
-@ 2 sam/ERR1823587_unmapped.bam \
-o sam/ERR1823587_unmapped_sorted.bam
samtools sort \
-n -m 5G \
-@ 2 sam/ERR1823601_unmapped.bam \
-o sam/ERR1823601_unmapped_sorted.bam
samtools sort \
-n -m 5G \
-@ 2 sam/ERR1823608_unmapped.bam \
-o sam/ERR1823608_unmapped_sorted.bam
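As an optional sanity check, we can confirm that a file is name-sorted by looking at the sort order recorded in the “@HD” line of its header. The sketch below assumes that the header carries an “SO:” tag, which we would expect after “samtools sort -n”. The function “sort_order” is a hypothetical helper, and the header text in the demonstration is made up so that the example runs without a BAM file.

```shell
# Hypothetical helper: reads SAM header text on standard input and prints the
# value of the SO (sort order) tag from the @HD line, if there is one.
sort_order() {
    awk -F'\t' '$1 == "@HD" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SO:/) print substr($i, 4)
    }'
}

# Usage with a real file (assumes samtools is on the PATH):
#   samtools view -H sam/ERR1823587_unmapped_sorted.bam | sort_order

# Self-contained demonstration with a made-up header:
printf '@HD\tVN:1.6\tSO:queryname\n@SQ\tSN:chr1\tLN:1000\n' | sort_order
```

A value of “queryname” indicates that the file is sorted by read name; “coordinate” would mean that the name sort still needs to be run before extracting the FASTQ files.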
Then, we create FASTQ files from the BAM files and store them in a new directory
“fastq_pure” so that we can use them in the next steps of the downstream analysis.
mkdir fastq_pure
samtools fastq -@ 4 sam/ERR1823587_unmapped_sorted.bam \
-1 fastq_pure/ERR1823587_pure_R1.fastq.gz \
-2 fastq_pure/ERR1823587_pure_R2.fastq.gz \
-0 /dev/null -s /dev/null -n
samtools fastq -@ 4 sam/ERR1823601_unmapped_sorted.bam \
-1 fastq_pure/ERR1823601_pure_R1.fastq.gz \
-2 fastq_pure/ERR1823601_pure_R2.fastq.gz \
-0 /dev/null -s /dev/null -n
samtools fastq -@ 4 sam/ERR1823608_unmapped_sorted.bam \